AITopics | validation question

Collaborating Authors

validation question

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

When Surveys Become Conversations: Adaptive Matrix Validation for AI-Assisted Interviews

McCormick, Tyler H.

arXiv.org Machine LearningJun-24-2026

AI-assisted interviews promise to reduce respondent burden in surveys by allowing respondents to describe experiences naturally while an AI system noisily maps those accounts into structured survey variables. That mapping is a measurement process that is fallible, versioned, adaptive, and potentially behaves differently across subgroups. This paper proposes Adaptive Matrix Validation (AMV), a design in which each respondent completes an AI-assisted interview, which is then mapped into tabular data by the AI. Respondents are also asked a small, randomized set of structured questions, which are used for statistical adjustment. The estimator first calibrates the mapped values using validation answers from other respondents, then corrects the remaining error with the validation answers observed for the target respondent. The paper develops estimators for item means, subgroup estimates, and regression coefficients when outcomes, predictors, or both are mapped from interviews. It also gives planning formulas the number of validation questions required and the sample size. A design-calibration simulation, an American Time Use Survey emulation, and a CHAMPS verbal-autopsy narrative study show when sparse validation can improve precision and when it cannot

machine learning, natural language, validation question, (17 more...)

arXiv.org Machine Learning

2606.24244

Country:

North America > United States (1.00)
Africa (0.67)

Genre:

Questionnaire & Opinion Survey (1.00)
Overview (1.00)
Personal > Interview (0.68)
Research Report > New Finding (0.66)

Industry:

Government > Regional Government > North America Government > United States Government (0.68)
Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

Ever: Mitigating Hallucination in Large Language Models through Real-Time Verification and Rectification

Kang, Haoqiang, Ni, Juntong, Yao, Huaxiu

arXiv.org Artificial IntelligenceNov-15-2023

Large Language Models (LLMs) have demonstrated remarkable proficiency in generating fluent text. However, they often encounter the challenge of generating inaccurate or hallucinated content. This issue is common in both non-retrieval-based generation and retrieval-augmented generation approaches, and existing post-hoc rectification methods may not address the accumulated hallucination errors that may be caused by the "snowballing" issue, especially in reasoning tasks. To tackle these challenges, we introduce a novel approach called Real-time Verification and Rectification (Ever). Instead of waiting until the end of the generation process to rectify hallucinations, Ever employs a real-time, step-wise generation and hallucination rectification strategy. The primary objective is to detect and rectify hallucinations as they occur during the text generation process. When compared to both retrieval-based and non-retrieval-based baselines, Ever demonstrates a significant improvement in generating trustworthy and factually accurate text across a diverse range of tasks, including short-form QA, biography generation, and multi-hop reasoning.

hallucination, preprint arxiv, ver, (16 more...)

arXiv.org Artificial Intelligence

2311.09114

Country:

North America > Mexico (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > Virginia > Fairfax County > McLean (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

Varshney, Neeraj, Yao, Wenlin, Zhang, Hongming, Chen, Jianshu, Yu, Dong

arXiv.org Artificial IntelligenceAug-12-2023

Recently developed large language models have achieved remarkable success in generating fluent and coherent text. However, these models often tend to 'hallucinate' which critically hampers their reliability. In this work, we address this crucial problem and propose an approach that actively detects and mitigates hallucinations during the generation process. Specifically, we first identify the candidates of potential hallucination leveraging the model's logit output values, check their correctness through a validation procedure, mitigate the detected hallucinations, and then continue with the generation process. Through extensive experiments with GPT-3.5 (text-davinci-003) on the 'article generation task', we first demonstrate the individual efficacy of our detection and mitigation techniques. Specifically, the detection technique achieves a recall of ~88% and the mitigation technique successfully mitigates 57.6% of the correctly detected hallucinations. Importantly, our mitigation technique does not introduce new hallucinations even in the case of incorrectly detected hallucinations, i.e., false positives. Then, we show that the proposed active detection and mitigation approach successfully reduces the hallucinations of the GPT-3.5 model from 47.5% to 14.5% on average. We further demonstrate the effectiveness and wide applicability of our approach through additional studies including performance on different types of questions (multi-hop and false premise questions) and with another LLM from a different model family (Vicuna). In summary, our work contributes to improving the reliability and trustworthiness of large language models, a crucial step en route to enabling their widespread adoption in real-world applications.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2307.03987

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Hawaii (0.04)
(23 more...)

Genre:

Research Report (1.00)
Workflow (0.66)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback